Zirui Wang

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

Jan 23, 2026
Via arXiv

Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs

Jan 20, 2026
Via arXiv

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025
Via arXiv

Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots

Nov 12, 2025
Via arXiv

Towards Adaptable Humanoid Control via Adaptive Motion Tracking

Oct 16, 2025
Via arXiv

COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization

Oct 08, 2025
Via arXiv

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025
Via arXiv

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025
Via arXiv

UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots

Jul 10, 2025
Via arXiv

Active View Selector: Fast and Accurate Active View Selection with Cross Reference Image Quality Assessment

Jun 24, 2025
Via arXiv